Automatic discourse structure generation using rhetorical structure theory
Author
Abstract
This thesis addresses a difficult problem in text processing: creating a system to automatically derive the rhetorical structure of text. Although rhetorical structure has proven useful in many fields of text processing, such as text summarisation and information extraction, systems that automatically generate rhetorical structures with high accuracy are difficult to find. This is because discourse is one of the largest and yet least well-defined areas in linguistics, and researchers have not reached agreement on the best method for analysing the rhetorical structure of text.

This thesis focuses on investigating a method to generate the rhetorical structure of text. By exploiting different cohesive devices, it proposes a method to recognise rhetorical relations between spans by checking for the appearance of these devices. These factors include cue phrases, noun-phrase cues, verb-phrase cues, reference words, time references, substitution words, ellipses, and syntactic information.

The discourse analyser is divided into two levels: sentence level and text level. The former uses syntactic information and cue phrases to segment sentences into elementary discourse units and to generate a rhetorical structure for each sentence. The latter derives rhetorical relations between large spans and then replaces each sentence by its corresponding rhetorical structure to produce the rhetorical structure of the text. The text-level rhetorical structure is derived by selecting rhetorical relations that connect adjacent, non-overlapping spans, forming a discourse structure that covers the entire text. Constraints of textual organisation and textual adjacency are used effectively in a beam search to reduce the search space when generating such rhetorical structures.
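The text-level step described above (joining adjacent, non-overlapping spans into one structure over the whole text, with adjacency and textual-organisation constraints pruning a beam search) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the cue table, relation labels and weights, the word-overlap score standing in for other cohesive devices, the paragraph-based form of the textual-organisation constraint, and all names (`beam_parse`, `score_pair`, `may_merge`, `show`) are hypothetical. Leaves are treated as already-analysed sentences, so the sentence-level step is skipped.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical cue table: a cue at the start of the right-hand span suggests
# its relation to the span on the left. Phrases, labels and weights are
# illustrative assumptions, not the thesis's resources.
CUE_TABLE = {
    "because": ("Cause", 1.0),
    "however": ("Contrast", 1.0),
    "but": ("Contrast", 0.8),
    "for example": ("Example", 1.0),
    "then": ("Sequence", 0.6),
}
DEFAULT = ("Elaboration", 0.0)  # fallback when no cue phrase is found
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "it", "in", "was"}


@dataclass(frozen=True)
class Node:
    start: int                      # index of the first leaf span covered
    end: int                        # one past the last leaf span covered
    relation: Optional[str] = None  # None marks a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def words(texts: list[str]) -> set[str]:
    # Lower-cased words of the given spans, minus a tiny stop list.
    return {w for t in texts for w in re.findall(r"[a-z]+", t.lower())} - STOPWORDS


def score_pair(spans: list[str], left: Node, right: Node) -> tuple[str, float]:
    """Relation from a leading cue phrase, plus word overlap (Jaccard) as a
    crude stand-in for lexical cohesion between the two spans."""
    opening = spans[right.start].lower()
    relation, cue_score = DEFAULT
    for cue in sorted(CUE_TABLE, key=len, reverse=True):  # longest cue first
        if re.match(rf"{re.escape(cue)}\b", opening):
            relation, cue_score = CUE_TABLE[cue]
            break
    lw = words(spans[left.start:left.end])
    rw = words(spans[right.start:right.end])
    overlap = len(lw & rw) / len(lw | rw) if lw | rw else 0.0
    return relation, cue_score + overlap


def may_merge(a: Node, b: Node, para: list[int]) -> bool:
    """Assumed textual-organisation constraint: spans from different
    paragraphs may be joined only once each covers whole paragraphs."""
    if para[a.start] == para[b.end - 1]:
        return True  # both spans lie inside one paragraph

    def whole(n: Node) -> bool:
        opens = n.start == 0 or para[n.start - 1] != para[n.start]
        closes = n.end == len(para) or para[n.end] != para[n.end - 1]
        return opens and closes

    return whole(a) and whole(b)


def beam_parse(spans: list[str], para: list[int], beam_width: int = 3) -> Node:
    """Join adjacent, non-overlapping spans until one tree covers the text."""
    if not spans:
        raise ValueError("need at least one span")
    beam = [(0.0, tuple(Node(i, i + 1) for i in range(len(spans))))]
    while len(beam[0][1]) > 1:  # all hypotheses shrink by one span per step
        best: dict[tuple, float] = {}
        for score, seq in beam:
            for i in range(len(seq) - 1):  # adjacency: neighbours only
                left, right = seq[i], seq[i + 1]
                if not may_merge(left, right, para):
                    continue
                relation, s = score_pair(spans, left, right)
                merged = Node(left.start, right.end, relation, left, right)
                new_seq = seq[:i] + (merged,) + seq[i + 2:]
                best[new_seq] = max(best.get(new_seq, float("-inf")), score + s)
        # Keep only the highest-scoring partial structures.
        ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
        beam = [(s, seq) for seq, s in ranked[:beam_width]]
    return beam[0][1][0]


def show(n: Node, spans: list[str]) -> str:
    if n.relation is None:
        return spans[n.start]
    return f"{n.relation}({show(n.left, spans)}, {show(n.right, spans)})"


if __name__ == "__main__":
    spans = [
        "The parser failed on long inputs.",
        "Because the beam was too narrow, good analyses were pruned.",
        "A wider beam fixed the parser.",
        "Then the tests passed.",
    ]
    para = [0, 0, 1, 1]  # paragraph index of each span
    print(show(beam_parse(spans, para), spans))
```

Because each paragraph must be joined internally before paragraphs are linked, this example yields `Elaboration(Cause(...), Sequence(...))`: "Because" and "Then" supply the two inner relations, and the link between paragraphs falls back to the default label. A larger `beam_width` keeps more partial structures alive at each step, trading speed for a wider search.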
Experiments carried out in this research achieved an F-score of 89.4% for discourse segmentation, 52.4% for the sentence-level discourse analyser and 38.1% for the final output of the system. These results show that this approach performs well in comparison with current research in discourse.
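An F-score is the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below computes it for discourse segmentation by comparing sets of predicted and gold boundary positions. The exact-match, boundary-based scheme and the name `boundary_f_score` are assumptions; the abstract does not say how boundaries or relations were matched.

```python
def boundary_f_score(gold: set[int], predicted: set[int]) -> float:
    """F-score of predicted segment boundaries against gold boundaries.

    Boundaries are positions (e.g. token offsets); a prediction counts as
    correct only if it matches a gold boundary exactly.
    """
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0  # also avoids dividing by zero for empty sets
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {4, 9, 15, 22}
    predicted = {4, 9, 17, 22, 30}
    # 3 of 5 predictions are correct (P = 0.6) and 3 of 4 gold boundaries
    # are found (R = 0.75), so F = 2 * 0.6 * 0.75 / 1.35
    print(round(boundary_f_score(gold, predicted), 3))  # 0.667
```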
Similar resources
The Prosody of Discourse Structure and Content in the Production of Persian EFL Learners
The present research addressed the prosodic realization of global and local text structure and content in the spoken discourse data produced by Persian EFL learners. Two newspaper articles were analyzed using Rhetorical Structure Theory. Based on these analyses, the global structure in terms of hierarchical level, the local structure in terms of the relative importance of text segments and the ...
Query Focused Summary Generation System using Unique Discourse Structure
In this paper, the authors propose a query focussed summary generation system which is constructed on top of a unique language-independent discourse structure. The discourse structure is comprised of three text representation techniques, namely, Universal Networking Language (UNL), Rhetorical Structure Theory (RST) and saṅgatis. The discourse structure is indexed based on a concept called sūtra...
The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts
This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical st...
Logic-Based Rhetorical Structuring for Natural Language Generation in Human-Computer Dialogue
Rhetorical structuring is a field approached mostly by research in natural language (pragmatic) interpretation. However, in natural language generation (NLG) the rhetorical structure plays an important part, in monologues and dialogues as well. Hence, several approaches in this direction exist. In most of these, the rhetorical structure is calculated and built in the framework of Rhetorical Struc...
Abstract Generation Based On Rhetorical Structure Extraction
Abstract generation is, like Machine Translation, one of the ultimate goals of Natural Language Processing. However, since conventional word-frequency-based abstract generation systems (e.g. [Kuhn 58]) are lacking in inter-sentential or discourse-structural analysis, they are liable to generate incoherent abstracts. On the other hand, conventional knowledge or script-based abstract generation systems (e.g...